ABSTRACT

In many experimental situations (particularly in computer simulation
studies) a large number of potentially important factors exist. Because
of time and budget limitations, it is imperative to screen these factors
in order to identify a subset which should be subjected to more detailed
examination. This paper evaluates the performance of a factor screening
technique which has been proposed for use when it is known that there is
at most one active factor (i.e., a factor which has an effect on the
response of interest). Performance evaluation reveals that the existence
of even a relatively small amount of random error renders essentially
useless a procedure which performs well in the deterministic case.

I. INTRODUCTION

In many experimental situations (particularly in computer simulation
studies) a large number of potentially important factors exist. Because
of time and budget limitations, it is imperative to screen these factors
in order to identify a subset which should be subjected to more detailed
examination. In general, the screening situation is one in which only a
small number of the factors are actually active (i.e., have an effect on
the response of interest). A number of factor screening approaches have
been suggested. [See Kleijnen (1975) for a summary.]

This  paper  will  concentrate  on  the  situation  in  which  it  is  known 
that  there  is  at  most  one  active  factor  in  the  set  of  factors  to  be 
screened.  In  a  sense,  the  problem  is  somewhat  analogous  to  searching  for 
one  possible  needle  in  a  haystack. 

The first-order model

$$y = \beta_0 + \sum_{i=1}^{K} \beta_i x_i + \varepsilon$$

will be assumed, where

(1) $y$ is the response of interest

(2) $x_i$ is a factor at two levels (+1 and -1)

(3) $\varepsilon$ is a random error term with $\varepsilon \sim N(0, \sigma^2)$

(4) $K$ is the number of factors to be screened

(5a) $\beta_j = \Delta \neq 0$ and $\beta_i = 0$ ($i \neq j$) for unknown $j$ if there
is an active factor

(5b) $\beta_i = 0$ ($i = 1, \ldots, K$) if no factor is active.

For this situation Ott and Wehrfritz (1972) have developed a procedure
for examining up to $K = 2^N - 1$ factors in $N$ runs under the additional
assumptions that $\sigma = 0$ (no random error) and $\Delta > 0$ (direction of
active factor effect known). Their procedure is concerned only with the
selection of the active factor, and not with determining the value of $\Delta$.

This paper will concentrate on a slightly revised Ott-Wehrfritz
procedure. Specifically, the performance of this revised procedure will be
examined when the two assumptions $\sigma = 0$ and $\Delta > 0$ are not made. To
compensate for the lack of information about the direction of active factor
effect, an extra run will be required. Thus, this revised Ott-Wehrfritz
procedure may be used to examine up to $K = 2^{N-1} - 1$ factors in $N$ runs.

Note  the  number  of  runs  that  can  be  saved  by  using  this  procedure. 

For  example,  a  total  of  127  factors  can  be  screened  using  only  eight  runs. 
Of  course,  a  savings  in  number  of  runs  is  truly  a  savings  only  if  the 
procedure  performs  well.  An  evaluation  of  that  performance  is  the  main 
topic  of  this  paper. 




II. APPLICATION OF THE REVISED OTT-WEHRFRITZ PROCEDURE

An easy method of generating an $N$-run screening design based on
the Ott-Wehrfritz procedure when not assuming $\Delta > 0$ is given in the
following steps:

(G1) Write (to the same number of places) the first $2^{N-1} - 1$
positive binary numbers in ascending order. Include leading
zeros so that each number has at least one leading zero.

(G2) Replace the zeros by -1's.

(G3) Let the resulting set of +1's and -1's corresponding to each
binary number constitute a column in the design matrix, keeping
the same order.

As an example, consider the case $N = 4$. Corresponding steps G1
through G3 and the resulting screening design are indicated in Figure 1.

It should be noted that this design is the same one given by Ott and
Wehrfritz (1972), but with permuted columns and an additional first row of
-1's (because of the lack of the assumption that $\Delta > 0$).
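
Steps G1 through G3 can be sketched in a few lines of Python. This is only an illustration, not code from the original study: the function name `screening_design` is hypothetical, and "the same number of places" is assumed here to mean $N$ places, which guarantees the leading zero required by G1.

```python
def screening_design(n_runs):
    """Return the N x K design matrix (a list of rows) for K = 2**(N-1) - 1.

    Assumption: each binary number is written to exactly N places, so the
    first digit is always zero and the first run is a row of -1's.
    """
    if n_runs < 2:
        raise ValueError("at least two runs are needed")
    n_factors = 2 ** (n_runs - 1) - 1
    columns = []
    for number in range(1, n_factors + 1):
        # G1: binary representation, padded with leading zeros to N places
        bits = format(number, "0{}b".format(n_runs))
        # G2: zeros become -1, ones become +1
        columns.append([1 if b == "1" else -1 for b in bits])
    # G3: each binary number is a column; transpose so that rows are runs
    return [list(row) for row in zip(*columns)]


design = screening_design(4)
for row in design:
    print(" ".join("{:+d}".format(v) for v in row))
```

For $N = 4$ this gives a 4 x 7 matrix whose first row consists entirely of -1's; for $N = 8$ it gives the 127 columns mentioned in the introduction.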

Figure 1: Generation of a Screening Design for the Revised Ott-Wehrfritz Procedure

A. ANALYSIS IN THE DETERMINISTIC CASE

In the deterministic case ($\sigma = 0$), the analysis may be summarized
in the following four steps:

(D1) Observe the $N$ responses $y_1, \ldots, y_N$.

(D2) Define, for $i = 1, \ldots, N$,

$$w_i = \begin{cases} 0 & \text{if } y_i = y_1 \\ 1 & \text{if } y_i \neq y_1 \end{cases}$$

(D3) Calculate the number $L = \sum_{i=1}^{N} 2^{N-i} w_i$.

(D4) If $L = 0$, conclude that there is no active factor. Otherwise,
select the $L$-th factor as the active factor.

Suppose that for the example considered, the observed responses
were $y_1 = 4.0$, $y_2 = -2.0$, $y_3 = 4.0$, and $y_4 = -2.0$ (step D1). Step D2
produces $w_2 = 1$, $w_3 = 0$, and $w_4 = 1$, while steps D3 and D4 result in the
selection of factor #5 as the active factor. Further, the first-order
model is given by

$$y = 1.0 - 3.0\,x_5$$
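
The deterministic analysis can be illustrated with a short Python sketch. The function names are hypothetical. The recovery of $\beta_0$ and $\Delta$ is an added assumption that relies on the first run having every factor at -1, so that $y_1 = \beta_0 - \Delta$ and any differing response equals $\beta_0 + \Delta$.

```python
def select_active_factor(y):
    """Steps D2 to D4: return L, where L = 0 means no active factor.

    Exact equality is used, so this is only valid with no random error.
    """
    n_runs = len(y)
    # D2: w_i = 0 if the response equals that of run 1, else 1
    w = [0 if yi == y[0] else 1 for yi in y]
    # D3: read w_1 ... w_N as a binary number
    return sum(2 ** (n_runs - i) * w[i - 1] for i in range(1, n_runs + 1))


def fit_first_order(y):
    """Return (beta0, delta) under the assumption that run 1 has x = -1."""
    others = [yi for yi in y if yi != y[0]]
    if not others:
        return y[0], 0.0
    return (y[0] + others[0]) / 2.0, (others[0] - y[0]) / 2.0


y = [4.0, -2.0, 4.0, -2.0]
L = select_active_factor(y)
beta0, delta = fit_first_order(y)
print(L, beta0, delta)
```

With the responses of the example, the sketch selects factor 5 and returns the coefficients 1.0 and -3.0 of the model given above.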

B. ANALYSIS IN THE NONDETERMINISTIC CASE

In the deterministic case, the observed $y_i$'s assume at most two
values. In the nondeterministic case ($\sigma > 0$), however, all of the $y_i$'s
assume different values with probability one. Because of this, analysis
steps D1 through D4 for the deterministic case cannot be used. Instead,
the analysis can be based on the statistic BSS/WSS, the ratio of the
between sum of squares to the within sum of squares. The corresponding
analysis steps are:

(N1) Order the observed $y_i$'s as $y_{(1)} < y_{(2)} < \cdots < y_{(N)}$.

(N2)  Consider  N  -  1  partitions  of  these  observations  into  two 
It should be noted that for the deterministic case, the preceding steps yield
results equivalent to those produced by steps D1 through D4. However, they
are more cumbersome.
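
A rough Python sketch of a partition-based analysis follows. It rests on several assumptions that are not spelled out in the text above: the $N - 1$ partitions are taken to be the splits of the ordered responses into a lower and an upper group, the split with the largest BSS/WSS is retained, and the group containing run 1 is treated as the -1 level (because the first design row is all -1's). The function names and the `critical_value` parameter are hypothetical; a critical value of zero corresponds to always selecting a factor.

```python
import math


def bss_wss(low, high):
    """Between over within sum of squares for a two-group split."""
    values = low + high
    grand = sum(values) / len(values)
    bss = wss = 0.0
    for group in (low, high):
        mean = sum(group) / len(group)
        bss += len(group) * (mean - grand) ** 2
        wss += sum((v - mean) ** 2 for v in group)
    if wss == 0.0:
        # No within-group scatter: infinite ratio unless all values coincide
        return math.inf if bss > 0.0 else 0.0
    return bss / wss


def select_by_partition(y, critical_value=0.0):
    """Return (L, ratio); L = 0 means no active factor is declared."""
    n_runs = len(y)
    order = sorted(range(n_runs), key=lambda i: y[i])  # run indices by response
    best_ratio, best_split = -1.0, None
    for split in range(1, n_runs):                     # the N - 1 partitions
        low = [y[i] for i in order[:split]]
        high = [y[i] for i in order[split:]]
        ratio = bss_wss(low, high)
        if ratio > best_ratio:
            best_ratio, best_split = ratio, split
    if best_ratio <= critical_value:
        return 0, best_ratio
    low_runs = set(order[:best_split])
    # Assumption: the group containing run 1 (index 0) is the -1 level
    base = low_runs if 0 in low_runs else set(order[best_split:])
    w = [0 if i in base else 1 for i in range(n_runs)]
    L = sum(2 ** (n_runs - 1 - i) * w[i] for i in range(n_runs))
    return L, best_ratio


print(select_by_partition([4.1, -1.9, 3.8, -2.2]))
```

Applied to a slightly perturbed version of the deterministic example, the sketch still selects factor 5; applied to the exact responses it agrees with steps D1 through D4.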


III.  PERFORMANCE  EVALUATION 


The  central  topic  of  this  paper  is  evaluation  of  the  performance 
of  the  revised  Ott-Wehrfritz  procedure  in  the  nondeterministic  case. 

In  this  case,  of  course,  a  factor  may  be  identified  as  active  when,  in 
fact,  all  factors  are  inactive.  The  probability  of  this  Type  I  error 
may  be  controlled  by  selection  of  the  appropriate  critical  value. 

On  the  other  hand,  if  there  is  an  active  factor,  the  analysis  may 
not  identify  this  factor  because  of  random  error.  Two  different  errors 
are  possible.  Either 

(1)  the  value  of  the  statistic  is  too  small  to  conclude  that  there 
is  an  active  factor, 

or  (2)  the  wrong  factor  is  identified  as  active. 

At this juncture, define the two events CD and A, where CD is the
event that the correct decision is made and A is the event that there
is an active factor. In this section, the performance of the revised
Ott-Wehrfritz procedure is evaluated by examining, for $\alpha$ = .05, .25, and
1.00, the value of $P(CD|A)$, the probability of correctly selecting the
active factor when one exists.

It should be noted that if $\alpha$ = 1.00, an implicit assumption is made
that one active factor is definitely present. Thus, for $\alpha$ = 1.00 it will
never be concluded that there is no active factor.

If there is one active factor, the observations $y_1, \ldots, y_N$ will
include $n$ observations from $N(\Delta, \sigma^2)$ and $N - n$ from $N(-\Delta, \sigma^2)$, assuming
without loss of generality that $\beta_0 = 0$. Of the $2^{N-1} - 1$ possible
factors to be screened in the $N$ runs, the corresponding columns (based
on the binary numbers) in the design matrix are such that $\binom{N-1}{n}$ columns
consist of $n$ +1's and $(N - n)$ -1's for $n = 1, \ldots, N-1$. If $A_n$ denotes
the event that the column corresponding to the active factor contains
exactly $n$ +1's, then the probability of interest may be written as:

$$P(CD|A) = \sum_{n=1}^{N-1} P(CD|A_n)\, P(A_n) = \sum_{n=1}^{N-1} P(CD|A_n) \cdot \left[ \binom{N-1}{n} \Big/ \left( 2^{N-1} - 1 \right) \right]. \qquad (1)$$
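
The weighting in equation (1) can be checked numerically. The following Python sketch uses hypothetical function names and assumes, as the bracketed term in equation (1) implies, that each of the $2^{N-1} - 1$ columns is equally likely to belong to the active factor.

```python
from math import comb


def column_weights(n_runs):
    """P(A_n) for n = 1, ..., N-1: the share of columns with exactly n +1's."""
    total = 2 ** (n_runs - 1) - 1
    return {n: comb(n_runs - 1, n) / total for n in range(1, n_runs)}


def combine(p_cd_given_an, n_runs):
    """Equation (1): weighted sum of the conditional probabilities."""
    weights = column_weights(n_runs)
    return sum(weights[n] * p_cd_given_an[n] for n in weights)


print(column_weights(4))
print(combine({1: 1 / 6, 2: 1 / 6}, 3))
```

The weights sum to one for any $N$, since the binomial coefficients for $n = 1, \ldots, N-1$ add up to $2^{N-1} - 1$.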

As might be expected, $P(CD|A_n)$ is unwieldy to evaluate analytically
since it represents the probability that for $n$ observations from $N(\Delta, \sigma^2)$
and $N - n$ from $N(-\Delta, \sigma^2)$, the maximum BSS/WSS ratio is produced by the
corresponding partition of the observations. Thus, Monte Carlo evaluation
was used.

A.  MONTE  CARLO  PROCEDURES 

In each case considered, it is assumed that for an $N$-run design a
total of $2^{N-1} - 1$ factors are to be examined. To evaluate $P(CD|A)$, the
probability of interest, a Monte Carlo procedure was used to estimate
$P(CD|A_n)$ for $n = 1, \ldots, N-1$. In general, the procedure used randomly
generated $x_1, \ldots, x_{N-n}$ from $N(-\Delta, \sigma^2)$ and $y_1, \ldots, y_n$ from $N(\Delta, \sigma^2)$. The
$x$'s were ordered as $x_{(1)} < \cdots < x_{(N-n)}$ and the $y$'s were ordered as
$y_{(1)} < \cdots < y_{(n)}$.

If $x_{(N-n)} < y_{(1)}$, the $N - 1$ possible partitions of the ordered
observations were evaluated to determine whether the two groups $(x_1, \ldots, x_{N-n})$
and $(y_1, \ldots, y_n)$ resulted in the maximum BSS/WSS ratio, and if so, whether
the resulting value was greater than the upper .05 and .25 points of the
null distribution. (If $x_{(N-n)} > y_{(1)}$, of course, the two groups of
interest do not result in the maximum value of BSS/WSS.)

The resulting estimate of $P(CD|A)$ was obtained by substituting the
estimates of $P(CD|A_n)$ into equation (1). It should be noted that, because
of symmetry, $P(CD|A_n) = P(CD|A_{N-n})$. This fact permitted
estimation of $P(CD|A)$ with a reduced total number of iterations. For each
probability $P(CD|A_n)$ estimated by the Monte Carlo procedure, 625 iterations
were used. This resulted in estimated standard errors ranging between 0.000
and 0.014.
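
The Monte Carlo procedure described above can be sketched in Python as follows. This is a reconstruction under stated assumptions, not the original program: the function names are hypothetical, the standard-library generator `random.gauss` stands in for whatever generator was actually used, and the critical value is passed in as a parameter because the upper .05 and .25 points of the null distribution are not reproduced here. The default of zero corresponds to $\alpha$ = 1.00.

```python
import math
import random


def bss_wss(low, high):
    """Between over within sum of squares for a two-group split."""
    values = low + high
    grand = sum(values) / len(values)
    bss = wss = 0.0
    for group in (low, high):
        mean = sum(group) / len(group)
        bss += len(group) * (mean - grand) ** 2
        wss += sum((v - mean) ** 2 for v in group)
    if wss == 0.0:
        return math.inf if bss > 0.0 else 0.0
    return bss / wss


def estimate_p_cd_given_an(n_runs, n_plus, delta, sigma,
                           iterations=625, critical_value=0.0, rng=None):
    """Estimate P(CD|A_n) for a column with n_plus +1's (delta >= 0 assumed)."""
    if rng is None:
        rng = random.Random()
    hits = 0
    for _ in range(iterations):
        x = sorted(rng.gauss(-delta, sigma) for _ in range(n_runs - n_plus))
        y = sorted(rng.gauss(delta, sigma) for _ in range(n_plus))
        if x[-1] >= y[0]:
            continue  # the groups overlap, so the correct split cannot win
        ordered = x + y
        # Ratios for the N - 1 splits of the ordered observations
        ratios = [bss_wss(ordered[:k], ordered[k:]) for k in range(1, n_runs)]
        target = ratios[n_runs - n_plus - 1]  # split separating x's from y's
        if target == max(ratios) and target > critical_value:
            hits += 1
    return hits / iterations


rng = random.Random(12345)
estimate = estimate_p_cd_given_an(3, 1, 0.0, 1.0, iterations=20000, rng=rng)
print(estimate)
```

The usage lines run the degenerate case $N = 3$, $\Delta = 0$, $\sigma = 1$, for which the estimate should lie close to 1/6. Estimates for each $n$ would then be substituted into equation (1).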

To provide a check on the Monte Carlo procedure, $P(CD|A)$ was calculated
by Monte Carlo for the (degenerate) case $N = 3$, $\Delta = 0$, $\sigma = 1$. In this case,
it can be shown analytically using symmetry arguments that

$$P(CD|A) = P(CD|A_1) = P(CD|A_2) = 1/6.$$

Based on 1000 iterations, this probability was estimated to be .152, well
within the 95% confidence interval of (.144, .190).
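
The quoted interval is consistent with a normal approximation to the binomial centered at the true value 1/6. The text does not state how the interval was computed, so the following Python check is an assumption about the method rather than a record of it.

```python
import math

p = 1.0 / 6.0          # analytic value of P(CD|A) in the degenerate case
iterations = 1000      # number of Monte Carlo iterations in the check

# Half-width of a 95% interval under the normal approximation
half_width = 1.96 * math.sqrt(p * (1.0 - p) / iterations)
lower, upper = p - half_width, p + half_width
print("({:.3f}, {:.3f})".format(lower, upper))
```

Rounded to three places, the limits are .144 and .190, and the estimate of .152 falls between them.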


B.  RESULTS 


The Monte Carlo procedure was used to obtain estimates of $P(CD|A)$ for
$N$ = 5, 6, 7, 8, 9 and $\sigma = r|\Delta|$ for $r$ = 0, .1, .2, .3, .4, .5, where $\Delta$ is the
coefficient of the active factor. Figures 3, 4, and 5 present, for various
values of $\sigma$, estimates of $P(CD|A)$ corresponding to $\alpha$ = .05, .25, and 1.00
respectively. As previously noted, the maximum estimated standard error
for any table entry is 0.014.
(Maximum Estimated Standard Error Is 0.013.)


Again it should be pointed out that the results in Figure 5 for
$\alpha$ = 1.00 may be interpreted as the chances of identifying the correct
factor when it is known that there is exactly one active factor present.
In other words, since a critical value of zero corresponds to $\alpha$ = 1.00,
a correct decision is made if the active factor corresponds to the
largest BSS/WSS ratio, regardless of its magnitude. From Figure 5, it can
be seen that the probability that the active factor does correspond to
the largest ratio decreases as $N$ increases, as intuition would suggest.

However, for the other values of $\alpha$ considered, this probability must
be multiplied by the probability that the largest ratio is greater than the
appropriate critical value. This latter probability, which increases with
increasing $N$, offsets the decreasing nature of the former probability to
produce, for $\alpha$ = .05 and $\alpha$ = .25, the resulting probability values which
also increase with increasing $N$.
IV. SUMMARY AND DISCUSSION


It must be emphasized that if the Ott-Wehrfritz procedure (or
its revised version given here) is to be used, the assumption that
there is at most one active factor is critical. If this assumption is not
true, many possible explanations of the data exist other than that
provided by applying the procedure. Although this assumption may appear
unrealistic, there may be situations where it is thought, a priori,
that there is an extremely small probability (say less than 1/(3K)) that
any given factor in the $K$ factors is active. Then, the chances of
encountering two or more active factors may be negligible.
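
To give a feel for the size of this probability, the following Python sketch evaluates it under an added assumption that is not stated in the text, namely that the $K$ factors are active independently, each with probability exactly 1/(3K). The function name is hypothetical.

```python
def prob_two_or_more_active(n_factors, p_active):
    """P(at least two active factors) for independent, equally likely factors."""
    p_none = (1.0 - p_active) ** n_factors
    p_one = n_factors * p_active * (1.0 - p_active) ** (n_factors - 1)
    return 1.0 - p_none - p_one


K = 127
value = prob_two_or_more_active(K, 1.0 / (3 * K))
print(value)
```

For $K = 127$ the result is a little under 5 percent, and it serves as an upper bound under this assumption because the per-factor probability was taken at its stated maximum.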

Nonetheless, as Figures 3 through 5 show, the existence of even a
relatively small amount of random error renders essentially useless a
procedure which performs well in the deterministic case. In summary,
the revised Ott-Wehrfritz procedure may provide a reasonable approach
if both of the following conditions hold:

(1) There is at most one active factor

and (2) $\sigma < .2|\Delta|$, where $\Delta$ is the coefficient of the active factor if
there is one.

Otherwise, adopting this procedure would tend to cause more grief than
benefit.